Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Associate a trailing end-of-line comment in a parenthesized implicit concatenated string with the last literal #15378

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Jan 9, 2025

Summary

This PR implements a fix for the implicit concatenated string formatting. Specifically for strings where the last part has a trailing end-of-line comment and they're in the value position of an assignment (including type alias) or return statement.

For example:

self._attr_unique_id = (
    f"{self._device.temperature.group_address_state}_"
    f"{self._device.target_temperature.group_address_state}_"
    f"{self._device.target_temperature.group_address}_"
    f"{self._device._setpoint_shift.group_address}"  # noqa: SLF001
)

got reformatted to

self._attr_unique_id = (
    f"{self._device.temperature.group_address_state}_"
    f"{self._device.target_temperature.group_address_state}_"
    f"{self._device.target_temperature.group_address}_"
    f"{self._device._setpoint_shift.group_address}"
)  # noqa: SLF001

which is undesired because the noqa comment is now misplaced. It's also impossible to move it back to the right place because the formatter keeps moving it out of the parentheses.

The reason this is happening is because we try to match Black's formatting for assignments:

a = (
	"a" "b"  # comment
)

# Get's formatted to 
a = "ab"  # comment

Notice how the comment gets moved out.

The relevant logic for this behavior lives here

if let Some(flat) = format_implicit_flat {
inline_comments.mark_formatted();
let string = flat.string();
let flat = format_with(|f| {
if string.is_fstring() {
let mut buffer = RemoveSoftLinesBuffer::new(&mut *f);
write!(buffer, [flat])
} else {
flat.fmt(f)
}
})
.memoized();
// F-String containing an expression with a magic trailing comma, a comment, or a
// multiline debug expression should never be joined. Use the default layout.
// ```python
// aaaa = f"abcd{[
// 1,
// 2,
// ]}" "more"
// ```
if string.is_fstring() && flat.inspect(f)?.will_break() {
inline_comments.mark_unformatted();
return write!(
f,
[maybe_parenthesize_expression(
value,
*statement,
Parenthesize::IfBreaks,
)]
);
}
let expanded = format_with(|f| {
let f =
&mut WithNodeLevel::new(NodeLevel::Expression(Some(group_id)), f);
write!(
f,
[FormatImplicitConcatenatedStringExpanded::new(
string,
ImplicitConcatenatedLayout::MaybeFlat
)]
)
});
// Join the implicit concatenated string if it fits on a single line
// ```python
// a = "testmorelong" # comment
// ```
let single_line = format_with(|f| write!(f, [flat, inline_comments]));
// Parenthesize the string but join the implicit concatenated string and inline the comment.
// ```python
// a = (
// "testmorelong" # comment
// )
// ```
let joined_parenthesized = format_with(|f| {
group(&format_args![
token("("),
soft_block_indent(&format_args![flat, inline_comments]),
token(")"),
])
.with_group_id(Some(group_id))
.should_expand(true)
.fmt(f)
});
// Keep the implicit concatenated string multiline and don't inline the comment.
// ```python
// a = (
// "test"
// "more"
// "long"
// ) # comment
// ```
let implicit_expanded = format_with(|f| {
group(&format_args![
token("("),
block_indent(&expanded),
token(")"),
inline_comments,
])
.with_group_id(Some(group_id))
.should_expand(true)
.fmt(f)
});
// We can't use `optional_parentheses` here because the `inline_comments` contains
// a `expand_parent` which results in an instability because the next format
// collapses the parentheses.
// We can't use `parenthesize_if_expands` because it defaults to
// the *flat* layout when the expanded layout doesn't fit.
best_fitting![single_line, joined_parenthesized, implicit_expanded]
.with_mode(BestFittingMode::AllLines)
.fmt(f)?;
} else if let Some(format_f_string) = format_f_string {

and inline_comments stores all the comments that should either be moved in or outside the parentheses.

Ideally, we would adjust this logic (and it's counterpart in

} else if let Some(format_implicit_flat) = &format_implicit_flat {
// F-String containing an expression with a magic trailing comma, a comment, or a
// multiline debug expression should never be joined. Use the default layout.
//
// ```python
// aaaa = f"abcd{[
// 1,
// 2,
// ]}" "more"
// ```
if format_implicit_flat.string().is_fstring()
&& format_value.inspect(f)?.will_break()
{
inline_comments.mark_unformatted();
return write!(
f,
[
before_operator,
space(),
operator,
space(),
maybe_parenthesize_expression(
value,
*statement,
Parenthesize::IfBreaks
)
]
);
}
let group_id = f.group_id("optional_parentheses");
let format_expanded = format_with(|f| {
let f = &mut WithNodeLevel::new(NodeLevel::Expression(Some(group_id)), f);
FormatImplicitConcatenatedStringExpanded::new(
StringLike::try_from(*value).unwrap(),
ImplicitConcatenatedLayout::MaybeFlat,
)
.fmt(f)
})
.memoized();
// Keep the target flat, parenthesize the value, and keep it multiline.
//
// ```python
// Literal[ "a", "b"] = (
// "looooooooooooooooooooooooooooooong"
// "string"
// ) # comment
// ```
let flat_target_value_parenthesized_multiline = format_with(|f| {
write!(
f,
[
last_target,
space(),
operator,
space(),
token("("),
group(&soft_block_indent(&format_expanded))
.with_group_id(Some(group_id))
.should_expand(true),
token(")"),
inline_comments
]
)
});
// Expand the parent and parenthesize the joined string with the inlined comment.
//
// ```python
// Literal[
// "a",
// "b",
// ] = (
// "not that long string" # comment
// )
// ```
let split_target_value_parenthesized_flat = format_with(|f| {
write!(
f,
[
group(&last_target).should_expand(true),
space(),
operator,
space(),
token("("),
group(&soft_block_indent(&format_args![
format_value,
inline_comments
]))
.should_expand(true),
token(")")
]
)
});
// The most expanded variant: Expand both the target and the string.
//
// ```python
// Literal[
// "a",
// "b",
// ] = (
// "looooooooooooooooooooooooooooooong"
// "string"
// ) # comment
// ```
let split_target_value_parenthesized_multiline = format_with(|f| {
write!(
f,
[
group(&last_target).should_expand(true),
space(),
operator,
space(),
token("("),
group(&soft_block_indent(&format_expanded))
.with_group_id(Some(group_id))
.should_expand(true),
token(")"),
inline_comments
]
)
});
// This is only a perf optimisation. No point in trying all the "flat-target"
// variants if we know that the last target must break.
if last_target_breaks {
best_fitting![
split_target_flat_value,
split_target_value_parenthesized_flat,
split_target_value_parenthesized_multiline,
]
.with_mode(BestFittingMode::AllLines)
.fmt(f)
} else {
best_fitting![
single_line,
flat_target_parenthesize_value,
flat_target_value_parenthesized_multiline,
split_target_flat_value,
split_target_value_parenthesized_flat,
split_target_value_parenthesized_multiline,
]
.with_mode(BestFittingMode::AllLines)
.fmt(f)
}
} else if let Some(format_f_string) = &format_f_string {
) but this is going to be fairly involved and is something I consider too risky for a bugfix release.

That's why I decided to instead implement it in place_comment and associate a comment in a parenthesized f-string expression that's on the same line as the last part with said part, but only if the last part is on its own line.

This has two downsides:

  1. Ideally, we'd collapse the following but we'll no longer be able to do that with the changes from this PR:

    a = (
        "a"
        "b" # comment
    )
    
    # could be formatted to
    a = "ab" # comment

    This used to work before and works for implicit concatenated strings outside of assignment-value positions. I think this is acceptable.

  2. This is directly related to the above limitation. I made this behavior specific to implicit concatenated strings in assignment value positions or return statements. That means that the behavior is somewhat inconsistent to how we e.g. handle implicit concatenated strings embedded in another expression (you can see an example in the snapshot changes)

I'm open to changing the behavior to all implicit concatenated strings. It might be the right move, but I'm not sure. There are trade offs ;)

Fixes #15375

Test Plan

Added snapshots

The ecosystem changes look good to me.

@MichaReiser MichaReiser added bug Something isn't working formatter Related to the formatter labels Jan 9, 2025
@MichaReiser

This comment was marked as outdated.

Copy link
Contributor

github-actions bot commented Jan 9, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check detected format changes. (+12 -12 lines in 6 files in 2 projects; 1 project error; 52 projects unchanged)

langchain-ai/langchain (+2 -2 lines across 1 file)

libs/core/tests/unit_tests/utils/test_html.py~L166

         '<a href="https://foobar.com:9999">BAD</a>'
         '<a href="http://foobar.com:9999/">BAD</a>'
         '<a href="https://foobar.com/OK">OK</a>'
-        '<a href="http://foobar.com/BAD">BAD</a>'
-    )  # Change in scheme is not OK here
+        '<a href="http://foobar.com/BAD">BAD</a>'  # Change in scheme is not OK here
+    )
 
     expected = sorted(
         [

rotki/rotki (+10 -10 lines across 5 files)

rotkehlchen/db/history_events.py~L362

         free_base_suffix = (
             f"* FROM (SELECT {base_suffix}) WHERE event_identifier IN ("
             f"SELECT DISTINCT event_identifier FROM history_events {ignore_asset_filter} "
-            "ORDER BY timestamp DESC,sequence_index ASC LIMIT ?)"
-        )  # free query only select the last LIMIT groups  # noqa: E501
+            "ORDER BY timestamp DESC,sequence_index ASC LIMIT ?)"  # free query only select the last LIMIT groups  # noqa: E501
+        )
 
         if has_premium:
             suffix, limit = premium_base_suffix, []

rotkehlchen/db/upgrades/v42_v43.py~L83

                 "DELETE FROM history_events WHERE identifier IN ("
                 "SELECT H.identifier from history_events H INNER JOIN evm_events_info E "
                 "ON H.identifier=E.identifier AND E.tx_hash IN "
-                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"
-            )  # location 'o' is zksync lite  # noqa: E501
+                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"  # location 'o' is zksync lite  # noqa: E501
+            )
             bindings: tuple = ()
             if customized_events != 0:
                 querystr += " AND identifier NOT IN (SELECT parent_identifier FROM history_events_mappings WHERE name=? AND value=?)"  # noqa: E501

rotkehlchen/db/upgrades/v43_v44.py~L222

                 "DELETE FROM history_events WHERE identifier IN ("
                 "SELECT H.identifier from history_events H INNER JOIN evm_events_info E "
                 "ON H.identifier=E.identifier AND E.tx_hash IN "
-                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"
-            )  # location 'o' is zksync lite  # noqa: E501
+                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"  # location 'o' is zksync lite  # noqa: E501
+            )
             bindings: tuple = ()
             if customized_events != 0:
                 querystr += " AND identifier NOT IN (SELECT parent_identifier FROM history_events_mappings WHERE name=? AND value=?)"  # noqa: E501

rotkehlchen/db/upgrades/v44_v45.py~L46

                 "DELETE FROM history_events WHERE identifier IN ("
                 "SELECT H.identifier from history_events H INNER JOIN evm_events_info E "
                 "ON H.identifier=E.identifier AND E.tx_hash IN "
-                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"
-            )  # location 'o' is zksync lite  # noqa: E501
+                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"  # location 'o' is zksync lite  # noqa: E501
+            )
             bindings: tuple = ()
             if customized_events != 0:
                 querystr += " AND identifier NOT IN (SELECT parent_identifier FROM history_events_mappings WHERE name=? AND value=?)"  # noqa: E501

rotkehlchen/db/upgrades/v45_v46.py~L187

                 "DELETE FROM history_events WHERE identifier IN ("
                 "SELECT H.identifier from history_events H INNER JOIN evm_events_info E "
                 "ON H.identifier=E.identifier AND E.tx_hash IN "
-                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"
-            )  # location 'o' is zksync lite  # noqa: E501
+                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"  # location 'o' is zksync lite  # noqa: E501
+            )
             bindings: tuple = ()
             if customized_events != 0:
                 querystr += " AND identifier NOT IN (SELECT parent_identifier FROM history_events_mappings WHERE name=? AND value=?)"  # noqa: E501

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to read examples/Assistants_API_overview_python.ipynb: Expected a Jupyter Notebook, which must be internally stored as JSON, but this file isn't valid JSON: expected `,` or `]` at line 197 column 8

Formatter (preview)

ℹ️ ecosystem check detected format changes. (+12 -12 lines in 6 files in 2 projects; 1 project error; 52 projects unchanged)

langchain-ai/langchain (+2 -2 lines across 1 file)

ruff format --preview

libs/core/tests/unit_tests/utils/test_html.py~L158

         '<a href="https://foobar.com:9999">BAD</a>'
         '<a href="http://foobar.com:9999/">BAD</a>'
         '<a href="https://foobar.com/OK">OK</a>'
-        '<a href="http://foobar.com/BAD">BAD</a>'
-    )  # Change in scheme is not OK here
+        '<a href="http://foobar.com/BAD">BAD</a>'  # Change in scheme is not OK here
+    )
 
     expected = sorted([
         "https://foobar.com/OK",

rotki/rotki (+10 -10 lines across 5 files)

ruff format --preview

rotkehlchen/db/history_events.py~L362

         free_base_suffix = (
             f"* FROM (SELECT {base_suffix}) WHERE event_identifier IN ("
             f"SELECT DISTINCT event_identifier FROM history_events {ignore_asset_filter} "
-            "ORDER BY timestamp DESC,sequence_index ASC LIMIT ?)"
-        )  # free query only select the last LIMIT groups  # noqa: E501
+            "ORDER BY timestamp DESC,sequence_index ASC LIMIT ?)"  # free query only select the last LIMIT groups  # noqa: E501
+        )
 
         if has_premium:
             suffix, limit = premium_base_suffix, []

rotkehlchen/db/upgrades/v42_v43.py~L83

                 "DELETE FROM history_events WHERE identifier IN ("
                 "SELECT H.identifier from history_events H INNER JOIN evm_events_info E "
                 "ON H.identifier=E.identifier AND E.tx_hash IN "
-                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"
-            )  # location 'o' is zksync lite  # noqa: E501
+                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"  # location 'o' is zksync lite  # noqa: E501
+            )
             bindings: tuple = ()
             if customized_events != 0:
                 querystr += " AND identifier NOT IN (SELECT parent_identifier FROM history_events_mappings WHERE name=? AND value=?)"  # noqa: E501

rotkehlchen/db/upgrades/v43_v44.py~L220

                 "DELETE FROM history_events WHERE identifier IN ("
                 "SELECT H.identifier from history_events H INNER JOIN evm_events_info E "
                 "ON H.identifier=E.identifier AND E.tx_hash IN "
-                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"
-            )  # location 'o' is zksync lite  # noqa: E501
+                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"  # location 'o' is zksync lite  # noqa: E501
+            )
             bindings: tuple = ()
             if customized_events != 0:
                 querystr += " AND identifier NOT IN (SELECT parent_identifier FROM history_events_mappings WHERE name=? AND value=?)"  # noqa: E501

rotkehlchen/db/upgrades/v44_v45.py~L46

                 "DELETE FROM history_events WHERE identifier IN ("
                 "SELECT H.identifier from history_events H INNER JOIN evm_events_info E "
                 "ON H.identifier=E.identifier AND E.tx_hash IN "
-                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"
-            )  # location 'o' is zksync lite  # noqa: E501
+                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"  # location 'o' is zksync lite  # noqa: E501
+            )
             bindings: tuple = ()
             if customized_events != 0:
                 querystr += " AND identifier NOT IN (SELECT parent_identifier FROM history_events_mappings WHERE name=? AND value=?)"  # noqa: E501

rotkehlchen/db/upgrades/v45_v46.py~L187

                 "DELETE FROM history_events WHERE identifier IN ("
                 "SELECT H.identifier from history_events H INNER JOIN evm_events_info E "
                 "ON H.identifier=E.identifier AND E.tx_hash IN "
-                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"
-            )  # location 'o' is zksync lite  # noqa: E501
+                "(SELECT tx_hash FROM evm_transactions) AND H.location != 'o')"  # location 'o' is zksync lite  # noqa: E501
+            )
             bindings: tuple = ()
             if customized_events != 0:
                 querystr += " AND identifier NOT IN (SELECT parent_identifier FROM history_events_mappings WHERE name=? AND value=?)"  # noqa: E501

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to read examples/Assistants_API_overview_python.ipynb: Expected a Jupyter Notebook, which must be internally stored as JSON, but this file isn't valid JSON: expected `,` or `]` at line 197 column 8

@MichaReiser MichaReiser force-pushed the micha/fix-implicit-concat-string-trailing-part-comment branch from 4070d80 to eef2e67 Compare January 10, 2025 09:40
@MichaReiser MichaReiser force-pushed the micha/fix-implicit-concat-string-trailing-part-comment branch from eef2e67 to 3d612d1 Compare January 10, 2025 09:41
@MichaReiser MichaReiser force-pushed the micha/fix-implicit-concat-string-trailing-part-comment branch from 881b881 to bde498b Compare January 10, 2025 10:23
@MichaReiser MichaReiser marked this pull request as ready for review January 10, 2025 10:24
@MichaReiser MichaReiser requested review from charliermarsh and dhruvmanila and removed request for charliermarsh January 10, 2025 10:24
Comment on lines 343 to 346
a = (
"a"
"b" "c" # belongs to the f-string expression
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the change but I want to confirm as to why does this comment belongs to the f-string expression. I think the reason is that without that, the formatting might move the last part of the string to it's own line along with the comment which means the comment won't be applicable for the last second part. Is that correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an edge case where there's no "right" solution because the formatter can't tell if the comment belongs to the middle part, the last part, or the entire f-string.

That's why I decided to associate it with the f-string. This matches the default behavior where we associate a comment with the outer most node that ends right before it. We could also decide to associate it with the last part, if we think this is more appropriate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that seems like a reasonable default. It might be worth adding this as a comment to that case.

@@ -355,6 +355,41 @@ fn handle_enclosed_comment<'a>(
AnyNodeRef::ExprGenerator(generator) if generator.parenthesized => {
handle_bracketed_end_of_line_comment(comment, source)
}
AnyNodeRef::StmtReturn(_) => {
Copy link
Member

@dhruvmanila dhruvmanila Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to cover other statements like raise and assert?

raise Exception(
    "a"
    "b"  # belongs to `b`
)

assert False, (
    "a"
    "b"  # belongs to `b`
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, only statements that use FormatStatementsLastExpression

@MichaReiser MichaReiser changed the title Keep trailing part comment associated with its part Associate a trailing end-of-line comment in a parenthesized implicit concatenated string with the last literal Jan 10, 2025
Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a wise behavior change. Is the intent that we'd eventually do the thing that you called "too risky for a bugfix release", so that we can collapse these in some cases?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working formatter Related to the formatter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implicit concatenated f-string: Trailing end-of-line comment is moved
3 participants